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Fig. 1. We synthesize multi-scattered illumination in clouds using deep radiance-predicting neural networks (RPNN). We combine Monte Carlo integration 
with data-driven radiance predictions, accurately reproducing edge-darkening effects (left), silverlining (right), and the whiteness of the inner part of the cloud. 


We present a technique for efficiently synthesizing images of atmospheric 
clouds using a combination of Monte Carlo integration and neural net- 
works. The intricacies of Lorenz-Mie scattering and the high albedo of 
cloud-forming aerosols make rendering of clouds—e.g. the characteristic 
silverlining and the “whiteness” of the inner body—challenging for methods 
based solely on Monte Carlo integration or diffusion theory. We approach 
the problem differently. Instead of simulating all light transport during ren- 
dering, we pre-learn the spatial and directional distribution of radiant flux 
from tens of cloud exemplars. To render a new scene, we sample visible 
points of the cloud and, for each, extract a hierarchical 3D descriptor of the 
cloud geometry with respect to the shading location and the light source. 
The descriptor is input to a deep neural network that predicts the radiance 
function for each shading configuration. We make the key observation that 
progressively feeding the hierarchical descriptor into the network enhances 
the network’s ability to learn faster and predict with higher accuracy while 
using fewer coefficients. We also employ a block design with residual con- 
nections to further improve performance. A GPU implementation of our 
method synthesizes images of clouds that are nearly indistinguishable from 
the reference solution within seconds to minutes. Our method thus repre- 
sents a viable solution for applications such as cloud design and, thanks to 
its temporal stability, for high-quality production of animated content. 
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1 INTRODUCTION 


Efficient and accurate rendering of high-albedo materials is a chal- 
lenging problem, especially when the optical properties vary spa- 
tially. Heterogeneous densities of aerosols formed into visible struc- 
tures, e.g. atmospheric clouds, is one such class of materials that 
challenge the efficiency of rendering algorithms. The high reflec- 
tivity of liquid droplets and frozen crystals produces appearances 
dominated by multiple scattering of light, often on the order of 
thousands of photon-matter interactions. Even if the discrete nature 
is ignored and the cloud is approximated by a continuous volume, 
estimating light transport by solving the radiative transfer equation 
(RTE) [Chandrasekhar 1960] is computationally intensive. 

Many numerical recipes for synthesizing photorealistic images 
are based on Monte Carlo (MC) integration. While these techniques 
have been successfully applied to volume rendering, and despite 
the theory naturally extends to multi-dimensional problems, MC 
rendering of atmospheric clouds faces two challenges. Firstly, the 
space of paths that transport non-negligible contributions is many- 
dimensional, requiring many samples to reduce estimation variance. 
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(a) 1 interaction 
Render time = 0.017 h 


(c) 16 interactions (d) 64 interactions t) All interactions 
h 6h h 


(b) 4 interactions i 


Render time = 0.89 Render time = 6.5 ender time = 16 ender time = 34 


Fig. 2. Multiply-scattered light transport is paramount for synthesis of realistic images of clouds. We show renderings of the same cloud under equal 
illumination where up to 1, 4, 16, 64, and arbitrarily many light-matter interactions are considered per light path. Because the albedo of clouds is close to 1, 
very long chains of light-matter interactions still carry significant amounts of energy and thus need to be considered, resulting in high render times. For each 


configuration we report the render time it takes to converge to the variance of the right-most image. 


Secondly, the construction of individual path samples is expensive 
even with sophisticated techniques for free-path sampling [Kutz 
etal. 2017; Szirmay-Kalos et al. 2017] and advanced acceleration data 
structures [Szirmay-Kalos et al. 2011; Yue et al. 2010]. Rendering 
realistic clouds with path-tracing algorithms thus requires tens of 
hours of computation to obtain acceptable error; see Figure 2. 

While such cost may be acceptable in some scenarios, interac- 
tive and throughput-oriented applications, such as asset design 
and movie production, require frames to be computed orders of 
magnitude faster. These typically employ approximations based on 
diffusion theory [Koerner et al. 2014; Stam 1995], density estima- 
tion [Elek et al. 2012], or semi-analytic solutions [Bouthors et al. 
2008] heavily reducing visual quality in favor of fast rendering. We 
similarly trade accuracy for higher speed, but tackle the problem by 
synthesizing high-order scattering using neural networks. 

Deep neural networks consist of compositions of simple building 
blocks that, when combined in the right way, achieve state-of-the- 
art performance in a wide variety of computational domains [LeCun 
et al. 2015]. Their application to transport problems is however 
still relatively unexplored. We demonstrate that, when designed 
carefully, these networks can accurately predict the distribution of 
radiance in clouds due to unoccluded distant lighting upon arbitrar- 
ily many scattering events. 

Since light propagation is a non-local phenomenon, the prediction 
speed hinges on the ability to concisely express the cloud geometry 
that is relevant to a given shading configuration. To that end, we 
employ a hierarchical point stencil that accurately captures the local 
neighborhood and compactly approximates remote regions and the 
overall bulk of the cloud. We observe that feeding the stencil to the 
network all-at-once yields poor results. Instead, we propose to input 
the hierarchy progressively, with each block of the network having 
direct access to at most one level of the hierarchy. Such design drives 
the network to reflect on the multi-scale nature of light transport 
and reduces the number of trainable parameters. This is key for 
performing many fast predictions needed to render the image. 

We compare our method to reference path-traced images and the 
state of the art in grid-resolved diffusion theory. Our results are 
significantly closer to reference images and render as fast as the 
other approximate method. 
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2 RELATED WORK 


Monte Carlo Methods. Kajiya and Von Herzen [1984] were the 
first to use path tracing for numerically estimating radiative transfer 
in volumes [Chandrasekhar 1960]. Their technique was later ex- 
tended by constructing paths in a bidirectional manner [Lafortune 
and Willems 1996], mutating paths using the Metropolis-Hastings 
algorithm [Pauly et al. 2000], and importance sampling of low-order 
scattering [Georgiev et al. 2013; Kulla and Fajardo 2012]. The effi- 
ciency of these algorithms can be increased by reusing computa- 
tion and correlating estimates as in radiance caching [Jarosz et al. 
2008], many-light rendering [Novák et al. 2012; Raab et al. 2008] and 
density-estimation-based methods [Jarosz et al. 2011, 2008; Jensen 
and Christensen 1998]. Křivánek et al. [2014] formulated a uni- 
fied theory for these seemingly incompatible approaches, allow- 
ing to combine their strengths in a robust estimator. Bitterli and 
Jarosz [2017] generalized volumetric density estimation to arbitrary- 
dimensional samples. Nevertheless, even with free paths and trans- 
mittance estimates constructed using advanced approaches [Kutz 
et al. 2017; Novák et al. 2014; Szirmay-Kalos et al. 2017] operating 
on special data structures [Szirmay-Kalos et al. 2011; Yue et al. 2011, 
2010], these methods are far from reaching interactive frame rates 
when used on the highly scattering materials that we target. 


Diffusion Theory. Certain situations permit approximating mul- 
tiple scattering with the diffusion theory, in graphics pioneered 
by Stam [1995] who rendered heterogeneous media by solving a 
discretized form of the diffusion equation on a grid. Koerner et al. 
[2014] modulate the diffusion coefficient to increase accuracy in 
regions with low density and/or high albedo. Several semi-analytic 
solutions based on combining transport contributions from two or 
more monopoles were developed for fast simulation of subsurface 
scattering [d’Eon and Irving 2011; Donner and Jensen 2005; Frisvad 
et al. 2014; Jensen et al. 2001]. Unfortunately, none of the diffusion 
approaches handle the highly anisotropic scattering in clouds well, 
and the appearance stemming from low- and mid-order scattering, 
e.g. silverlining, is not reproduced well. 


Tabulated Multiple Scattering. The anisotropy can be better pre- 
served when using precomputation-based solutions. Szirmay-Kalos 
et al. [2005] trace a number of light paths to build illumination net- 
works and reevaluate the transport on the fly achieving real-time 
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(b) Henyey-Greenstein, g = 0.857 
Equal-time variance = 698 


(c) Lorenz-Mie (LM) 
Equal-time variance = 23136 


(d) Chopped LM 
Equal-time variance = 136 


(e) Chopped LM after 1 bounce 
Equal-time variance = 144 


(a) Isotropic phase function 
Equal-time variance = 17.7 


Fig. 3. Front-lit (top) and back-lit (bottom) cloud from Figure 2 rendered with different phase functions. Isotropic scattering (a) reduces contrast and fails 
to synthesize silverlining. The Henyey-Greenstein approximation (b) incorrectly increases contrast (top) and produces a different silverlining effect than 
Lorenz-Mie (LM) scattering (c). Chopping the peak (d) leads to lower estimation variance, but also removes the strong silverlining. Using the complete LM 


phase function at the first bounce only and then switching to the chopped version provides nearly perfect result and low estimation variance (e). 


frame rates. Moon et al. [2007] proposed to tabulate scattered ra- 
diance on spherical shells surrounding a directional point emitter 
embedded in a granular material. An analogous approach was used 
by Lee and O’Sullivan [2007] to handle homogeneous continuous 
volumes, and further adapted to heterogeneous media by Müller 
et al. [2016]. All these methods are either interactive, or produce 
high-fidelity images, but none of them achieve both concurrently. 


Specialized Methods for Rendering Clouds. Fast modeling and ren- 
dering of clouds was first addressed by Gardner [1985] who used 
ellipsoids with spatially varying densities obtained as a superpo- 
sition of sine waves. Real-time rates can be achieved on graphics 
hardware by rasterizing particles [Harris and Lastra 2001], meta- 
balls [Dobashi et al. 2000], or volume slices [Riley et al. 2004] into 
billboard textures and projecting them onto the image plane. At 
best, these approaches capture only a limited amount of (anisotropic) 
scattering; backscattering is often completely ignored. Neglecting 
high-order scattering, as proposed by Nishita et al. [1996], leads 
to poor results (according to current standards) in optically thick 
clouds; see Figure 2. 

Bouthors et al. [2008] acknowledge the anisotropy by employing 
Lorenz-Mie scattering. We also adopt Lorenz-Mie scattering with 
the assumption of spherical scatters; non-spherical scatterers have 
been studied by Frisvad et al. [2007]. 

Bouthors et al. [2008] further propose to precompute solutions for 
flat slabs that are then fitted into the cloud geometry achieving real- 
time frame rates. Instead of slabs, Elek et al. [2012] record photons in 
a regular grid where each cell models the angular distribution using 
the Henyey-Greenstein (HG) basis. In a follow-up [Elek et al. 2014], 
the authors increased the performance by combining the HG basis 
with the discrete-ordinates method. Nevertheless, for albedos close 
to 1 the approach requires either excessive numbers of iterations 
or multiple super-imposed low-resolution grids that tend to blur 
high-frequency illumination patterns. 

Our goal is to accurately synthesize all-order, anisotropic scat- 
tering and accurately capture the appearance of the dense inner 
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part and wispy boundaries. We explore a new approach based on 
approximating the cloud geometry by a hierarchical descriptor and 
predict local illumination using a deep neural network. 


Neural Networks. Deep neural networks (see Bengio et al. [2013]; 
LeCun et al. [2015] for a comprehensive review) are able to efficiently 
model complex relationships between input and output variables in 
a highly non-linear manner. This data-driven approach has emerged 
as a state-of-the-art technology in a wide variety of challenging 
problems, e.g. image recognition [He et al. 2016; Simonyan and 
Zisserman 2014], machine translation [Wu et al. 2016], or generative 
modeling of raw audio and natural images [Oord et al. 2016a,b]. 

Deep learning has also been successfully applied to problems in 
computer graphics. Nalbach et al. [2017] use convolutional neural 
networks (CNNs) to synthesize ambient occlusion, illumination, 
and other effects in screen space. Bako et al. [2017] and Chaitanya 
et al. [2017] employ neural networks for denoising rendered images. 
These approaches operate in 2D, using color and feature images as 
inputs to the network. Similarly to Chu and Thuerey [2017], who 
apply CNNs to fluid simulation, our networks operate in 3D. 

Closely related to ours is the work by Ren et al. [2013], who train 2- 
hidden-layer MLPs to predict radiance in a specific region of a scene. 
In contrast, our 23-layer MLPs are applicable to an entire spectrum 
of atmospheric clouds. We use a hierarchical feature and feed its 
levels into the network progressively. While the first provides a 
cost-effective extraction of global information, the latter keeps the 
number of trainable parameters low facilitating fast predictions. 


3 RADIATIVE TRANSFER IN CLOUDS 


In this section, we discuss properties of clouds that are key for 
synthesizing their appearance correctly (see Figure 3). We then 
briefly review the RTE to allow defining the quantity to be predicted 
and describing our approach in the following section. 
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3.1 Properties of Clouds 


Clouds can be primarily categorized as a cirrus (thin, wispy curls), 
a stratus (hazy, featureless sheets), or a cumulus (fluffy heaps). The 
last type is the most challenging to render due to its high-frequency 
boundary details and the smooth bright appearance of the inner 
body stemming from many interactions that light undergoes inside. 

The aerosols forming a cloud consist of many small water parti- 
cles—droplets or ice crystals—the placement and size of which can 
be well represented statistically. The radii of droplets in a cumulus 
cloud range from 1 to 100 microns with means typically around 
5 u [Weickmann and Kampe 1953]. The droplet-size distribution 
function (DSD) has a key impact on the angular distribution of scat- 
tered light, and can be well approximated by a gamma function. 
Because these droplets absorb little light, long chains of light-matter 
interactions are an integral part of the characteristic appearance of 
clouds (see Figure 2). Since the scatterers are comparable in size to 
the wavelength of visible light, the distribution of scattered light, 
represented by phase function p, needs to be derived from Maxwell’s 
equations. Lorenz [1890] and Mie [1908] developed solutions for 
planar waves interacting with a homogeneous set of spheres. The 
resulting phase function exhibits a very strong diffraction peak, 
a wide forward-scattering lobe, a back-scattering circular ridge, 
and a back-scattering peak; see Figure 4 for an example and the 
supplementary material for details. 

As shown in Figure 3, the intricate shape of the Lorenz-Mie phase 
function produces characteristic visual effects such as silverlining, 
fogbow, and glory, which are lost or synthesized inaccurately when 
using an isotropic or Henyey-Greenstein approximation. At the 
same time, however, using the high-frequency, multi-modal Lorenz- 
Mie distribution makes it difficult to sample the high-energy, multi- 
scattered transport efficiently; too difficult to render our training 
data within reasonable time. We follow the suggestion of Bouthors 
et al. [2008] to “chop” the diffraction peak and lower the cloud den- 
sity according to the fraction of scattered light contained in the peak. 
The explicit simulation of near-perfect forward scattering—the main 
source of sampling difficulties—is thus removed and accounted for 
implicitly by reducing the optical thickness; see the supplementary 
material for details. 

As shown in Figure 3(d), the chopped Lorenz-Mie reduces estima- 
tion variance and produces accurate results in front-lit or side-lit 
configurations. However, since the slight angular blurring due to the 
diffraction peak is replaced by continuing in the perfectly forward 
direction, the very prominent effect of silverlining in backlit config- 
urations is nearly completely lost. Since the silverlining stems from 
low-order scattering, we opt to use the full Lorenz-Mie phase func- 
tion for the first bounce and then switch to the chopped version and 
its corresponding optically thinner volume. This preserves the effect 
of silverlining and provides sufficient performance for creating a 
large set of ground-truth simulations to train our networks. 


3.2 Radiative Transfer 


As in other works, we approximate the discrete aerosol mixture by 
a continuous medium, parameterized by a spatially varying extinc- 
tion coefficient +(x), a globally constant albedo a, and an axially 
symmetric phase function p(cos 0). In such a medium, the change 
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Fig. 4. Isotropic (green), Henyey-Greenstein with g = 0.857 (red), and 
Lorenz-Mie (dark blue) phase functions. The light blue shows an unnormal- 
ized version of Lorenz-Mie with a chopped diffraction peak. 


in radiance L at point x in direction w is governed by the radiative 
transfer equation: 


(@ - V)L(x, @) = = He CLK, w) + Us) | , PO : @)L(x, @) da, 


Integrating both sides of the differential RTE along w yields the 
following Fredholm integral equation of the second kind: 


L(x, @) -[ exp (-[ p(s) psu) f plo - O)L(Xy, ©)d@ du, 


where x, = x — uw and analogously for x,. The exponential term 
represents the attenuation of light on a straight line between x and 
Xu commonly referred to as transmittance T(x, Xu). 

Since clouds are non-emissive and light can only be injected 
from the outside, we need to define a proper boundary condition. 
Assuming x, is on a hypothetical boundary Q that encloses the 
cloud, i.e. Vx, € Q : u+(xp) = 0, we have 


b 
L(x, w) =f T(x, xusta) f plo - ©)L(xu, ©) d@ du 
0 s2 
+ T(x, xp)L(xp, @), (1) 


where b is the distance to the boundary. 

Equation (1) can be estimated by recursive Monte Carlo integra- 
tion: we alternate sampling of distances and directions, gradually 
building a path towards the boundary and evaluating the radiance 
thereof (e.g. by extending the path beyond Q). Even with the latest 
importance-sampling techniques for building volumetric paths, ren- 
dering high-albedo clouds by path tracing is too costly for interactive 
applications and we use it only for constructing the training set. 


4 RADIANCE-PREDICTING NEURAL NETWORKS 


We propose a supervised learning approach to sidestep the costly 
evaluation of the recursion in Equation (1). Our goal is to efficiently 
synthesize the in-scattered radiance 


Ls(x, @) = a plo - @)L(x, @) do. 


To that end, we considered learning and predicting the complete 
Ls(x, œ), but observed that the network struggles with predicting 
uncollided radiance, Lg(x, œ); i.e. the light that arrives from light 
sources directly and contributes to single scattering along camera 
rays. We thus opt for predicting only multi-scattered transport, 
precisely the indirect in-scattered radiance: 


Li(x, @) = J plo - @)(L(x, @) — Lg(x, @)) do. 
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(a) Stencil grid (b) Two levels (c) Local frame 


Fig. 5. The descriptor consists of 5 x 5 X 9 stencils (a) that are progressively 
scaled and turned into a hierarchy (b), which is then oriented towards the 
light source (c); only two levels of the hierarchy are shown here (red and 
blue) to avoid clutter. 


For the the rest of the transport we employ MC integration that 
samples visible points of the cloud (first integral in Equation (1)) 
and estimates single scattering thereof. 


4.1 Learning from Data 


The goal in supervised learning is to find a function g(z; 0) : RÍ >R, 
which maps a feature descriptor z to a target value, L;(x, œ) in our 
case, and depends on a set of parameters, 0. Since light transport 
is global, L;(x, w) is effectively a function of not only the location 
x and direction w, but also depends on the light source and the 
density structure of the entire cloud; we denote this set S and refer 
to it as “shading configuration”. We define an efficient descriptor 
Z=AP(S)E Rf to accurately represent S without excessive compu- 
tation and storage requirements; details follow in Section 4.2. 
Then, given a finite sample of labeled training data 


Dy = {(21, Li(x1, 01)),..., (zn, Likn, On))}, 


we aim to find the value of 0 that minimizes the average loss between 
the predictions and the target values, 


N 
as 1 
Oe mee N > €(g(zi; 0), Li(xi, wi)). (2) 


i=1 
We propose restricting g to a particular neural-network architecture 


that we describe in Section 4.3. 


4.2 Descriptor Construction 


The main challenge of defining a good descriptor z for a given con- 
figuration S is in capturing relevant information from the density 
of the cloud. In theory, a neural network should be able to extract 
all necessary information from raw data. However, we demonstrate 
that the number of trainable parameters can be significantly reduced 
by providing the network with an easy-to-compute feature descrip- 
tor. Managing the complexity of the network in this way has the 
benefit of speeding up the training procedure. More importantly 
however, it speeds up the predictions; millions of which are needed 
to synthesize a single frame. 

To represent small-scale detail near x, but also the overall shape 
of the cloud, we employ a hierarchy of point stencils that sample 
the cloud density. The stencils at individual levels are identical, 


but their support increases geometrically with scaling factor gka, 
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Fig. 6. We feed individual levels of the hierarchical stencil progressively. 
The upper part of the network consists of 2-layer blocks (blue) with residual 
connections, each of which processes the output of the previous block and 
one level of the stencil. We also add 3 standard layers (green) at the end. In 
all results we use K = 10, i.e. the complete network contains 23 layers. 


where k is the (1-based) index of the level; we use K = 10 levels in 
total. The stencil in the highest level covers a (2°)? z 108x larger 
volume than the stencil at k = 1. In order to make the descriptor 
scale independent, we express the stencil dimensions and density 
values in terms of the mean free path; we do so by scaling the cloud 
geometry and density such that one unit of distance corresponds 
to one mean free path within the average density of the cloud; the 
average is computed over all non-zero voxels. We refer to the scaled 
density as p. 

All stencils share the same local frame centered at x. The z-axis 
of the frame points towards the light source (e.g. the Sun or another 
distant emitter), the x-axis is perpendicular to the plane formed by 
the z-axis and o, and the y-axis is orthogonal to the two. The stencil 
at each level is formed by 225 points. Initially, these points form a 
5 x 5 x 9 grid in an axis-aligned box with [-1, —1,—1] and [1, 1, 3] 
being two opposing corners. The stencil at level k is then further 
scaled by 2*-}, 


see Figure 5 for an illustration. For each level k, 
we extract the density values rk = {o(ak). (a), vd -p(as,s)}. the 


union of which, = = eo =, represents the complete stencil. 
The hierarchical stencil describes the cloud density with respect 
to location x, direction towards the light source w);, and the plane 
that w; forms with w. We also add the angle y = cos !(w - %1) to 
complete the description of the configuration, i.e. z = {X, y}. 


4.3. Network architecture 


We use a network architecture based on a multilayer perceptron 
(MLP). In a standard MLP, each layer m = 1,..., M applies a linear 
transformation to the output of the previous layer followed by 
an element-wise nonlinear activation function to obtain the layer 
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output. We propose a modified architecture, tailored to the problem 
at hand, that yields better results than vanilla MLPs. We observe that 
the prediction quality improves if the individual levels of the stencil 
are fed into the network progressively, and that the performance 
further increases with residual connections [He et al. 2016]. 

The architecture is visualized in Figure 6. For each of the K levels 
of the hierarchical stencil ©, we construct a block structure that 
consists of two fully-connected layers which are bypassed using 
a residual connection. Each block k processes the corresponding 
stencil level =* and angle y, denoted z* = {=*, y}, and the output 
of the previous block o*~! as follows 


FE = f (vec! + wot + bt), 
k _ yy7k pk k 

FE = WERK + bk, 

ok = F (Fs +081) 5 


where V, W and b are trainable weights and biases, respectively. 
We use the rectified linear unit (ReLU) activation function f(a) = 
max(0, a). The residual connections serve two purposes: (i) they are 
necessary to train deeper networks efficiently [Balduzzi et al. 2017] 
and (ii) they encourage the network to learn how best to combine 
each layer in the input hierarchy with the information from previous 
levels in the stencil. 

We also add a number of standard fully connected layers of the 


form o” = f (wrom + b) to process the output of the last 
block. The output of the very final layer represents the predicted 
value, i.e. g(z; 0) = oM, where M is the total number of blocks and 
standard layers, and the set of parameters @ consists of!: 


k k k pk pki K M 
0 = {V", Wy, Wy, by, b3 tpi {WY bY mK: 


4.4 Rendering 


In order to synthesize images using our network, we integrate it 
into a Monte Carlo ray tracer. We stochastically sample the first 
scattering interaction with delta tracking [Woodcock et al. 1965] 
and estimate direct in-scattering via Monte Carlo and predict indi- 
rect in-scattering with our neural network. Our renderer supports 
arbitrary distant light sources, which can be approximated well by 
drawing a number of directional samples. The direct in-scattering 
is estimated using next-event estimation, i.e. by combining samples 
from the phase function and the light source via multiple importance 
sampling. To predict indirect in-scattering, we extract descriptor 
z of the shading configuration and query our network for L(x, œ). 
The direct and indirect components are added, multiplied by jus to 
obtain the absolute in-scattered radiance, and propagated back to 
the camera. By numerically integrating the predictions over the 
solid angle of the environment emitter, we can render clouds under 
environment illumination while having trained only for simpler, 
purely directional illumination conditions. 


5 TRAINING 


We train, validate, and test our networks on a mix of 80 procedu- 
rally generated, simulated, and artist-created clouds? represented 


1We keep w! only for equation brevity; it could be excluded since 0° = 0. 
2A subset is available on https://www.disneyresearch.com/publication/deep-scattering. 
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by voxel grids with resolutions within [100-1200]?. The geome- 
try of each cloud is uniformly scaled to fit into a unit box and the 
density scaled accordingly to preserve the look. For testing, we 
carefully handpicked 5 clouds to cover the spectrum of shapes and 
thicknesses; these clouds are not used for training or validation. The 
remaining 75 clouds were used to generate training and validation 
data. 


5.1 Data Generation 


To compute a single training/validation record (z;, Lj(xi, wi)), we 
sample a configuration S;, estimate the indirect in-scattered radi- 
ance L;(x;, wi) thereof, and extract the descriptor z;. The shading 
location (xj, wi) is picked by first sampling a random direction wj 
uniformly over the solid sphere. Next, a ray origin x is picked such 
that the cloud’s bounding sphere’s cross section is hit with a uni- 
form probability density by the ray parametrized by (x, w;). Lastly, 
we trace the ray into the cloud by sampling the free-flight distance 
to obtain point x;. If this step does not produce a position within 
the cloud, we repeat the entire process. This procedure ensures that 
the density of records is higher near the cloud “surface”, mimicking 
the distribution of records as found when path tracing, as is desired 
for the rendering scenario described in Section 4.4. 

In typical scenarios, clouds are illuminated primarily by the Sun 
and the environment. Since the Sun can be parameterized well by a 
single direction, and the illumination from the sky and the ground 
can be approximated as a superposition of multiple directional lights, 
we train only for directional light sources. We sample a random 
direction œ, from which light will arrive at the shading location, 
and then estimate the indirect in-scattered radiance L;(x;, wi) using 
standard MC path tracing with next-event estimation; we draw 
samples progressively until the sample mean lies within a +2% 
interval around the true answer (estimated with 95% confidence). 

To study the impact of the phase function, we generated two data 
sets, each with N = 15 million samples. The first data set uses the 
tabulated chopped Lorenz-Mie phase function and reduced density; 
the version with the peak is used only at the first bounce while 
rendering. The second set uses a Henyey-Greenstein phase function 
with g = 0.857, matching the mean cosine of the Lorenz-Mie phase 
function with the peak. 


5.2 Training Procedure 


We use the squared error loss function between the logarithm of the 
network output and the L;(x, w) estimated while generating data. 
Specifically, for a minibatch 8 the loss is 
1 2 
tg = — (10g (1 + g(zi;0)) — log (1 + Lilxi,0)))) 
B| icB 


We solve Equation (2) using stochastic gradient descent with the 
Adam update rule using the default learning rate, the decay param- 
eters reported by Kingma and Ba [2014], and minibatches of size 
IB] = 1000. 

Each of the two data sets described in Section 5.1 is split 75% : 25% 
between non-overlapping training and validation subsets. The first 
is used for training the parameters of the network and the second 
to monitor convergence and select hyperparameters. 
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(a) PT reference (b) RPNN (ours) (c)PT  (d) RPNN (e) PT reference (f) RPNN (ours) (g) PT (h) RPNN 


Fig. 7. Comparison of path-traced references and our approach that employs radiance-predicting neural networks (RPNN) on several different clouds 
illuminated by a sun-sky model. Our images accurately predict the incident illumination both due to the sun and the sky, which is estimated by sampling a 
number of light directions and evaluating our predictive model on them. When considering only sun-sky illumination, our method converges 24x faster than 
path tracing on average. Please see the supplementary material for additional results. 


J 


Side-lit CLouD 3 


Front-lit CLouD 4 


Back-lit CLoup 1 


mh. E. 


(a) PT reference (b) RPNN (ours) (c) Flux-limited Diffusion (d) PT reference (e) RPNN (f) FLD 


Fig. 8. Comparison of path-traced references, our method, and flux-limited diffusion (FLD) on three different clouds illuminated from three distinct directions. 
In contrast to FLD, our data-driven method correctly captures subtle contrast details (top), better preserves overall brightness of thin clouds (middle), and 
produces effects such as silverlining when trained on simulations of Lorenz-Mie scattering that FLD approximates by isotropic scattering. Columns on the 
right show zoom-ins. All FLD and RPNN images were rendered in approximately 6 seconds. 


2017-09-27 09:44 page 7 (pp. 1-11) ACM Transactions on Graphics, Vol. 36, No. 6, Article 231. Publication date: November 2017. 


231:8 


6 RESULTS 


We compare the results obtained with our data-driven method to 
reference images rendered with a volumetric path tracer, and approx- 
imate solutions obtained with flux-limited diffusion (FLD) [Koerner 
et al. 2014]; we use the authors’ GPU implementation. The five 
clouds shown in the figures here and in the supplemental materials 
were excluded from the training and validation sets. 


Implementation. We implemented our prediction model in Ten- 
sorFlow [Abadi et al. 2015], including the camera-ray generation, 
free-path sampling along primary rays, and single scattering estima- 
tion to allow for an efficient integration and execution on the GPU. 
For evaluating the densities, we constructed a Gaussian-filtered den- 
sity mipmap of the cloud to avoid aliasing artifacts and flickering 
in animated sequences. We used oj = 21, where i is the mip level, 
as the standard deviation of the Gaussian filters. All our and FLD 
results were obtained using an NVIDIA Titan X (Pascal) GPU. The 
path-traced reference images were computed on two Intel Xeon E5- 
2680v3 CPUs (24 cores / 48 threads in total). We also experimented 
with GPU path tracing, which, however, was close to 2x slower than 
the dual-CPU setup due to the incoherent memory accesses inherent 
to tracing long light paths through clouds. We therefore compare 
our method to the faster CPU baseline. All images were rendered at 
1280 x 720 resolution. The rendering time depends linearly on the 
number of pixels and can be controlled by adjusting the resolution. 


Comparison to Reference. Figure 7 compares our solution to a 
path-traced reference on a number of different clouds in varying 
lighting configurations. All images in the figure use a sun-sky model; 
see the supplementary materials for results with HDR environment 
maps. In most cases, our neural network predicts the radiance values 
accurately, producing images that are nearly indistinguishable from 
the reference; we recommend studying the differences using the 
supplemented interactive viewer. The highest errors appear in the 
back-lit CLoup 4 (bottom right), where the network produces an 
image with slightly less contrast. This is partly due to the fact that 
our data set of 80 clouds consists mostly of cumulus clouds, whereas 
the problematic cloud is closer in shape to a stratus. 

Since both methods are Monte Carlo estimators, we can quantify 
their efficiency in reducing noise using time to unit variance (TTUV), 
defined per image as the product of render time and mean pixel 
variance. The TTUV can also be interpreted as the variance after 
rendering for one unit of time. Computing the ratio of the TTUV of 
the reference and the TTUV of our method yields the speedup we 
achieve over path tracing, which is reported alongside the incurred 
bias (of converged results) in Table 1. Our approach is up to three 
orders of magnitude faster than the path tracer under directional il- 
lumination, meaning that a high-quality image produced by the path 
tracer within a day can be synthesized using our method in mere 
seconds. Having to integrate numerically over multiple light direc- 
tions reduces our speedup since much of the noise comes from the 
directional sampling. Hence the sun-sky lighting and the relatively 
low-dynamic-range envmap render generally slower; we discuss 
a possible remedy in Section 7. In the case of directional illumina- 
tion, the speedups are lower in back-lit configurations, where most 
energy is contributed by single scattering that we do not accelerate. 
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Table 1. Speedup over path tracing and bias of converged images computed 
with our technique for 5 different clouds under directional illumination, a 
sun-sky model, and environment maps with medium dynamic range. For 
directional light sources we compare against flux-limited diffusion [Koerner 
et al. 2014] and achieve one to two orders of magnitude lower bias. 


Bias FLD Bias RPNN (Ours) Speedup RPNN (Ours) 


Scene Light dir. Dir. illum Dir. illum Sun-sky  Envmap Dir. illum. Sun-sky Envmap 


Side 2.32e-03 2.02e-03 1.21e-03 417 x 19.1 x 
CLoup 1 Front 1,54e-03 26.9 X 18.5 X 
Bac 5.46e-04 WO 100x 47x 
Side 1.13e-03 18.7 X 10.6 x 
CLoup 2 Front 1.32e-03 28.7 XxX 5.9 Xx 
Bac! 38 X 13.0 x 36 x 
Side 1.31e-03 19.7 x 63 X 
CLoup 3 Front 1.46e-03 13:6) x BRIEK 


Bac 


Side 
CLoup 4 Front 


} 7.69e-03 | 62x 53x 30x 


2.04e-03 1.24e-03 2591X] 12.1 x 
1.75e-03 158003 170603 15.8 x 9.9 Xx 


Bac 6.35e-03 4.94e-03  6.06e-03 Soy 155 x TB] 
Side 2.08e-03  8.41e-03  1.61e-03 250x 9.8 x 
Croun5 Front 1.32e-03 1.16e-03 36.6 X 49X 
Bac 5.59e-03 5.18e-04 30x 23x 29x 


Comparison to Flux-limited Diffusion. In Figure 8, we compare 
also to FLD. Unlike our method, the diffusion theory-based FLD sim- 
ulates all orders of scattering—including single scattering—with the 
inherent assumption of isotropic scattering. This prevents correctly 
synthesizing the silverlining effect, which stems from low-order, 
highly forward Lorenz-Mie scattering. While thin clouds are prob- 
lematic for our method (due to a rather small number of those in 
the training set), FLD produces even worse results, overestimating 
the cloud brightness (middle row) and significantly changing the 
visual look of the cloud (see the supplementary material). In Table 1, 
we report the root-mean-square error on converged images of FLD 
and our method as bias. Our method outperforms FLD by one to 
two orders of magnitude in this regard. 


Impact of Phase Function. In Figure 9, we compare renderings with 
Lorenz-Mie and Henyey-Greenstein phase functions synthesized by 
path tracing and our method. Being an approximation, our method 


Lorenz Mie 


Henyey-Greenstein 


(a) PT reference 


(b) RPNN 


Fig. 9. Visual comparison of Lorenz-Mie scattering and the Henyey- 
Greenstein approximation rendered with path tracing and our method 
trained on data simulated with the corresponding phase function. Note that 
the (vertical) difference between the two phase functions is larger than the 
(horizontal) difference between the reference and our solution. 
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— Ours [10 x 2 x 200,3 x 200] 

—— Shallow wide MLP [2 x 400, 3 x 200] 
— Deep narrow MLP [23 x 200] 


—— Regular Stencil [5 x 5 x 5] 
—— Regular Stencil [5 x 5 x 9] 


— Data-Driven Stencil [7 x 7 x 7] 
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Iterations Iterations 


Fig. 10. Convergence of validation error for different network designs (left) 
and various stencil shapes with our network architecture (right). 


does not match the path tracer exactly, however, the visual difference 
is smaller than between the two phase functions. Previous methods 
for fast rendering of clouds heavily approximated or ignored either 
the cloud geometry, or the shape of Lorenz-Mie phase function. Our 
method is the first to produce high-quality results within minutes. 


7 DISCUSSION 


In this section, we compare our network design to three other ar- 
chitectures that we tested, analyze the impact of certain hyper- 
parameters, and discuss limitations of our current approach. 


Comparison to Vanilla MLP and CNN. We experimented with a 
vanilla MLP with different numbers of layers and nodes. Figure 10 
compares our approach to a shallow wide MLP with 5 layers con- 
taining [400, 400, 200, 200, 200] nodes, and to a deep narrow MLP 
with 23 layers each containing 200 nodes. The deep narrow MLP 
has the same number of layers as our design: we use 10 blocks, each 
with 2 layers, and 3 layers at the end (each layer contains 200 nodes). 
The main difference to our design is that we feed the hierarchical 
descriptor to the other two variants all at once; the entire hierarchy 
is input to the first layer. The other difference is the presence of 
residual connections in our design. All compared MLPs use about 1.4 
million trainable parameters. As demonstrated by the convergence 
curves, our architecture learns faster and converges to a lower error. 
This highlights the benefit of the progressive feeding that provides 
means to better adapt to signals at different frequency scales. 

We also tested a CNN; please see the supplementary document for 
the description of the architecture. We did not see quality improve- 
ments over our approach while incurring a significant performance- 
and stability hit. This may be because the size of the input is rela- 
tively small compared to applications where CNNs excel. Further- 
more, the spatial structure which CNNs are designed for is fully 
captured by the input stencil. We conjecture that if the descrip- 
tor was sufficiently larger, an appropriately designed CNN might 
yield a quality improvement over our network at the expense of 
additional time spent constructing descriptors, which is already the 
computational bottleneck of our approach. 


Stencil Size. We tested several stencils and their sizes. Figure 10 
compares the convergence of the validation error with stencils of 
varying resolution and a data-driven stencil with points distributed 
according to the mean fluence distribution around the shading lo- 
cation; please see the supplementary material for details. Contrary 
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to our intuition, the fluence-based approach never exceeded the 
performance of the straightforward regular stencils. We chose the 
5x59 configuration as it strikes a good balance between accuracy 
and the cost of querying the density values and number of trainable 
parameters in the network. 


Including Direct In-scattering in Predictions. We compared the con- 
vergence error of learning all in-scattered radiance Ls (green curve) 
and only the indirect compo- 
nent L; (blue curve). The net- 
work struggles with learning 
the full Ls due to the very 
high-frequency nature of direct 
illumination. This component 


Learning Ls 
Learning L; 


is largely responsible for the 
prominent silverlining effect in 


backlit configurations, and in- 


accurate predictions would be detrimental to the overall visual qual- 
ity. Since the variance of integrating the direct in-scattered radiance 
via Monte Carlo is relatively low, excluding Lg from the prediction 
helps attain high quality at tractable extra cost as shadow rays can 
be made coherent. 


Non-Cloud Shaped Objects. We tried to predict radiance on non- 
cloud shaped objects using one of our models we trained on clouds 
(see Figure 11 and the supplementary material). As expected, our 
model performs worse than on clouds, given that the training data 
is not representative of the rendered shapes. While general charac- 
teristics of light transport are present, the brightness of subsurface 
scattering is overestimated. These experiments highlight another 
limitation of our approach: our stencil—being unable to resolve de- 
tail at large distances—cannot capture distant hard shadows well. 
We do not expect a proper training set to fully alleviate this problem. 


(a) PT reference (b) RPNN (Ours) (c) PT (d) RPNN 


Fig. 11. Comparison of path-traced references and our method on the STAN- 
FORDBunny and the SpHEREBox scenes under directional illumination. On 
such shapes, which are radically different from the clouds that the network 
was trained on, the incurred bias is substantially higher. Furthermore, due 
to the coarse approximation of distant shapes by our stencil, hard shadows 
can not be well synthesized (bottom row). Please see the supplementary 
material for additional results. 
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(b) 4 SPP (c) 16 SPP 


Fig. 12. If rendered at low samples per pixel, images may suffer from noise 
due to the MC integration that is used for sampling visible points and 
estimating single scattering. The noise level increases when lighting the 
cloud by the surrounding environment (bottom). Our method requires 1-3 
seconds per SPP for images in our dataset. 


Environment Lighting. Since we estimate part of the transport 
using MC integration (distance sampling and single scattering), the 
rendered images may still suffer from a small amount of residual 
noise. Figure 12 compares the levels of noise with different numbers 
of samples per pixel (SPP). The noise is exacerbated by environmen- 
tal lighting, which requires multiple samples/predictions to cover 
the full sphere of directions. An interesting solution to avoid sam- 
pling the environment map would be to expand it into a suitable 
basis, e.g. spherical harmonics, and feed the expansion coefficient 
into the network, thereby training it to predict the radiance due to 
the full sky and ground. While fruitful, we believe this topic war- 
rants its own investigation and leave it for future work. The noise 
could also be removed by applying an a-posteriori denoiser. 


Retraining and Adaptation. Being data-driven, the performance 
of our method depends largely on how well the training set repre- 
sents the actual rendering scenario. That being said, our networks 
require ~12h of training on a single GPU. Tailoring their perfor- 
mance to a particular kind of cloud or specific cloud formations is 
thus relatively easy, provided that enough training data is available. 
Lightweight retraining could also be achieved by employing the 
idea of progressive networks [Rusu et al. 2016]. 


Temporal Stability. A key strength of our approach is its temporal 
stability stemming from the hierarchical descriptor and filtering. 
The accompanying video shows sequences with an orbiting camera 
and light sources, a cloud with changing density, and a cloud with 
changing shape, all of which are temporally stable. 


8 CONCLUSION 


We presented a novel approach for rendering atmospheric clouds at 
fast rates. Our radiance-predicting neural networks represent a new 
point in the spectrum of methods trading accuracy for speed. We 
achieve results that are nearly indistinguishable from the reference 
often faster or at comparable cost to previous approaches. The key 
and novel ingredient of our method is the progressive feeding of a 
hierarchical descriptor into the network. We further improve perfor- 
mance using standard techniques, such as residual connections and 
mip-mapping. Our method is temporally stable and could present 
a viable solution for applications such as asset design, future com- 
puter games, but also offline rendering if a small amount of bias is 
visually acceptable. We demonstrated that deep neural networks 
can be successfully applied to transport problems and hope that our 
work will stimulate further developments in that direction. 
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